Phase Diagrams of Quasispecies Theory with Recombination and Horizontal Gene 

Transfer 
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We consider how transfer of genetic information between individuals influences the phase diagram 
and mean fitness of both the Eigen and the parallel, or Crow-Kimura, models of evolution. In the 
absence of genetic transfer, these physical models of evolution consider the replication and point 
mutation of the genomes of independent individuals in a large population. A phase transition occurs, 
such that below a critical mutation rate an identifiable quasispecies forms. We show how transfer 
of genetic information changes the phase diagram and mean fitness and introduces metastability in 
quasispecies theory, via an analytic field theoretic mapping. 

PACS numbers: 87.10.+e, 87.15.Aa, 87.23.Kg, 02.50.-r 



We consider how quasispecies evolution changes in the 
presence of transfer of genetic information between indi- 
viduals in a population. That is, we quantify by quasis- 
pecies theory the mutational load, if any, introduced by 
a model of recombination and gene transfer. Exchange of 
genetic information between individuals is believed to be 
pervasive in nature and crucial to evolutionary dynam- 
ics (for reviews, see 0, 0, HJ). Experiments and theory 
have emphasized that recombination and gene transfer 
in various forms increase the rate of laboratory directed 
protein evolution [J, [fj, [6[ (for reviews see 0,11). Other 
experiments have amplified this point and have also sug- 
gested that, while significant in practice, the advantage 
of recombination may simply be to speed up the evolu- 
tionary process that would naturally occur by mutation 
alone in the limit of a long enough evolutionary time or 
a large enough population size (g. ToL 11 1. 



The Eigen [12j and Crow-Kimura [13| , or parallel, mod- 
els of viral quasispecies evolution are among the simplest 
that capture the basic processes of mutation, selection, 
and replication that occur in natural evolution. These 
mathematical models exhibit phase transitions, such that 
for mutation rates below critical values, an identifiable 
quasispecies forms. The Eigen and parallel quasispecies 
models are archetypes of biological evolution, and they 
have become a popular entry point to evolutionary biol- 
ogy for physicists 0, EI El O EI El, H3, EH . Quan- 
tification of the mutational load of transfer of genetic 
information has been done by numerical solutions of the 
single-mutation-per-replication Eigen model for the spe- 
cial case of a linear replication rate function [22[ . It was 
found that for intermediate population sizes and finite 
times, genetic transfer dramatically speeds up the rate 
of evolution. Phase diagrams were not determined, due 
to the focus on finite times and population sizes. We 



here derive by analytical calculation the mutational load 
and evolutionary advantage induced by transfer of ge- 
netic information for arbitrary replication rate functions 
in both the parallel and continuous-time Eigen models 
of quasispecies theory. That is, we find the infinite- 
time, infinite-genome-length, and infinite-population-size 
phase diagrams and mean fitness values of these models 
of quasispecies evolution in the general case. As an exam- 
ple, for the sharp-peak replication rate function, trans- 
fer of genetic information has two effects: sharpening of 
the population in the selected phase (u = 1 instead of 
< u < 1) and maintaining the unselected phase as 
metastable to higher growth rates. 

We model transfer of genetic information by replace- 
ment of part of the genome in a random individual with 
sequence taken at the same genomic location from a ran- 
dom parent. One common way for this process to occur in 
viruses or bacteria is by recombination. We assume that 
each of these exchanges of genetic information changes 
only one base of the sequence, such as might occur by 
homologous recombination in a stable population with 
relatively low diversity. We consider an infinite popula- 
tion of individuals with fixed genome size N, and each 
site of the genome may be in one of two states (e.g. purine 
or pyrimidine). A given sequence i reproduces at rate r^, 
and point mutations occur at rate \x per site to change 
sequence j to i. 

Transfer of genetic information occurs in the process 
in which a base at position k from any sequence j ran- 
domly replaces the base at position k in sequence i' with 
frequency v. Quasispecies theory with transfer of genetic 
information is described by the equation 



dt 



2 N 



EN v" 1 ' V 1 ' 
fc=i Li> Ljgi'gj 



vN qi 



2 



2" 

E 

3=1 



JY 



/'<;'/, 



'E(* 



fc=i 



^2™ 



(1) 



where the primes indicate that sequence j equals se- 
quence i at position k, and sequence i' equals sequence i 
except possibly at position k. The notation i' = <j\(k)i 
indicates the sequence i' that results from changing base 
k in sequence i. We have defined qi to be propor- 
tional to the probability of sequence i in the population. 
We note that the recombination term conserves parti- 
cle number: taking the sum of Eq. (TTJ) over i causes the 
mutation and recombination terms to cancel. We de- 
fine the average spin at position k at time t as u(k), 
u( k ) = EjO^.+i - 5 4,-i)%'/E m <?m, where s{ repre- 
sents the base at position k of sequence j. The recombi- 
nation process is, thus, described by 
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We analyze this equation at long times, when u be- 
comes independent of both time and position. In this 
limit, the non-linear Eq. ((2|) becomes a linear equation, 
with a self-consistency condition for u. We find the long- 
time solution by a mapping to a two-component field 
theory, using the procedure of [I3| ■ We define the space 
= Ei I s *)- The evolution equation is then 

cast as 31 IV 7 ) = — -^IV'}- The space is represented in terms 
of creation and annihilation operators. We seek the op- 
erator form of Eq. ([1]). Each spin state, \s l n ) = | + 1), 
| — f), is created by a different operator. 

We introduce two pairs of creation and annihilation 
operators for the space, with the constraint that at each 
position one and only one particle is present. We label the 

creation operators at position j as a (j) = (a\(j), a\{j)). 
The Hamiltonian is given by 



H = n^2[M{j) - 1] + vNR 
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where fj, is the mutation rate, M(n) mutates s l at po- 
sition n, N f(y,i) — Ti is the replication rate, vN is the 
recombination rate, and R is the recombination opera- 
tor. The mutation operator M(j) operates on site j and 

are defined by M(j) = a (j) ■ a\d{j). The recombination 



operator is given by 



R = 
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The formal solution is \ip(t)) = e |^(0)), which im- 
plies that the joint probability distribution at time t 
is given by #j(t) = {si\e~ Ht J2i 9i(0)l s ')- We introduce 
the coherent state representation \z) = e a ' z ~ z a |0,0), 
where z — {z\,z%) is a two- vector. The coherent states 
satisfy a completeness relation / = j[Dz*T>z\\z)(z\. 
The overlap of the coherent states is given by (z'\z) = 
e -W*-(z-z)-(z'-r)-z\^ Tjgfng [l3], we use the Trotter 
factorization to find the evolution operator is 
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where e = t/M. 

The probability to go from an initial state <?(i) =7 to 
a final state g^A—y, where 1 < < 2 indicates the 
composition of the pair of bases at position j, is 
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where Q(j) = TltLii 1 + sB k {j)}, with B k (j) = fx((7i 

/ i+ u i+ u 

I) + v{D - I) + £> 3 , and D = ^ ^ 

V 2 2 

We are interested in the probability distribution at 
long times, which for a given u grows as e^ mt by the 
Perron- Frobenius theorem, where f m is the largest eigen- 
value of —if, and / m is equal to the mean replication rate 
at long times [l7j ]. when u is self-consistently determined. 
We evaluate the contribution to this eigenvalue from Q(j) 
by considering the expression Tr Q(j). We find 



InTr Q(j) ~ t J (/i + ^/2) 2 + + g - A* - "/2 -(7) 



We note that in the limit of infinite v, InTr Q(j) ~ 

tit^c- The mean replication rate is given by f m = 

max ?c^c{/(^ c ) _ £ c £ c + [ mTr QO')]/*}- Maximizing over 
£ c , we find 
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The observable surface magnetization, u, is given by the 
implicit self-consistency condition /(it) = / m . Thus, the 
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two variables £ c and u need to be determined when solv- 
ing Eq. ([8]). This procedure provides the exact solution 
to the parallel model of recombination for a general repli- 
cation rate function. 

To illustrate how recombination affects the error- 
threshold phase transition, we calculate the error thresh- 
old for three different replication rate functions. We first 
consider in detail /(l) = A and / = otherwise. Eq. ([8|) 
is maximized at £ c = 1 or £ c = 0. The error threshold is 
given for u — by A > fi+u/2. The self-consistency con- 
dition f m = f{u) can only be satisfied by u = \—0(l/N). 
Thus, due to the non-linearity, u = 1 in the selected 
phase, in contrast to the case without recombination, for 
which u = 1 — fi/A [l3|. In the selected phase / m = A — [i. 
Thus, the true error threshold is A > fi, with A > /i+z//2 
the limit of metastability for initial conditions with u ~ 0. 
If we are in the selected phase and reduce the replication 
rate of the sharp peak, we transform to the unselected 
phase at the solid line of Figure [T] If, however, we are 
in the unselected phase and increase the replication rate 
of the sharp peak, we may not transform to the selected 
phase until reaching the short-dashed line of Figure [T] 
We next consider in detail the case of the quadratic repli- 
cation rate f(u) — ku 2 /2. By setting Eq. ((HJ equal to 
f(u) for small u, we find that the error threshold is given 
by k > (fi + v)/[l + v /(2fj,)]. At small v, recombination 
has again shifted the transition by +v/2. As an example 
of general parameter values, for v = 1, ji = 1, and k = 2, 
we find u = 0.4671 and f m = 0.2182. Solving Eq. (TT|) 
numerically for N = 10 3 , v = 1, /i = 1, and k — 2, we 
find u = 0.4662 and / m = (/(uj)) = 0.2183. Note re- 
combination introduces a genetic load for the quadratic 
replication rate, since in the absence of recombination 
U = 1 — fi/k =1/2 for these parameters. Note also that 
metastability does not occur for the quadratic replica- 
tion rate. The error-threshold phase diagram for these 
two cases is shown in Figure [TJ We finally consider in 
detail the linear fitness /(£) = fco + fc£- We find that for 
all values of fco, k > 0, /i > 0, and v > 0, the optimal 
value of £ c is positive. Thus, the selected phase always 
occurs 

We now turn to consider recombination in the Eigen 
model. In the Eigen model, when a virus reproduces, 
the virus copies its genome, making mutations at a rate 
of 1 — y per base per replication. The un-normalized 
probability distribution in genome space satisfies 
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dt 
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The degradation rate is defined analogously to the repli- 
cation rate by Di = Ndivn). Here the transition rates 
are given by = y N - d tt>3){\ - We define the 

parameter /i = N(l — y)/y to characterize the per genome 
replication rate, where we take fi = O(l), consistent with 
observed mutation rates in many viruses and bacteria 
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FIG. 1: The selected phase, in which a finite fraction of the 
population has a non-zero magnetization, is shown for the par- 
allel model with recombination. The rate of genetic exchange 
is p, and the rate of mutation is /i. The phase diagram is 
shown for the replication rate f(u) — A5 Ut i (solid line, with 
short-dashed line the metastable limit) and the replication 
rate /(it) = ku 2 /2 (dashed line). Also shown is the phase 
diagram for the Eigen model for f(u) — (A — Ao)S U: i + Aq 
(solid line, with short-dashed line the metastable limit) and 
/(it) = ku 2 /2 + fco (long-dashed line) [ordinate coordinates in 
brackets] . 
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<W)j>fc) (^TT^.+i + L -T^\-i) • The rate of ge- 
netic exchange per site is v/N . This equation is gener- 
ated to O(N ) by the Hamiltonian 
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We define Xk = jjY,f=i S \j) 4 D ^U) and Vk = 
Y^,f=i -^(iVi^fc-iO)- We integrate out the z field, to 
find Bk(j) = f]k<Ji+£,kO'3 + XkD. The action is, therefore, 
given to O(N ) by 
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AHnTr Q(j) ,(11) 



where InTr Q(j) ~ t(y/(fj c + X c/2) 2 + ux£ c + &) + Xc/2. 
Note that the probability of any of the N bases un- 
dergoing both mutation and recombination is 0(v[a/N). 



We have / m = max 
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d (Cc) ~ £c?c - VcVc - XcXc + [InTr Q(j)]/t}. Maximiz- 
ing over £, c ,fj c , and % c , we find that fj c i] c + XcXc + £c£c = 
(In q)/t. We use the additional relation \i\ c — vfj c to find 
rio{Q = {(l-a/[l-^7(2M + ^) 2 ]} 1/2 and xdic) = 
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1 /2 

<c - u (rfc + - l) / 2 - Tne mean re P n " 
cation rate is given by the expression 

f m = maxfe-^-^/fc)-^)} . (12) 

Sc 

The observable u is given implicitly by / m = /(it) — 

To illustrate how recombination during replication af- 
fects the error-threshold phase transition, we calculate 
the error threshold for three different replication rate 
functions. We first consider in detail /(l) = A and 
f = Aq otherwise. Eq. (fT2|) is maximized at £ c = 1 
or £ c = 0. The error threshold is given for u — by the 
equation Ae _At_!/ ' 2 > Aq. Due to the non-linearity, u = 1 
in the selected phase, in contrast to the case without re- 
combination where u = (Ae~>* - A )/(A - Aq) [17]. The 
mean replication rate is given by / m = Ae~^ . Thus, the 
true error threshold is Ae^^ > A , with Ae~^~ v / 2 > A 
the limit of metastability for initial conditions with u ~ 0. 
The limits of the bistable region, which is reminiscent of 
the bistable region found numerically in a related Eigen 
model with a type of recombination 24|, are demarked 
by the solid and short-dashed lines in in Figure Q] For 
our second example, we consider the quadratic fitness 
/(0 = k + fc£ 2 /2. By setting Eq. ^ equal to f(u) 
for small u, we find that the error threshold is given by 
k > ko(fi+u)/[l+i'/(2fi)]. At small v, recombination has 
again shifted by transition by +v/2. The error-threshold 
phase diagram for the sharp peak and quadratic replica- 
tion rate cases is shown in Figured] For our third exam- 
ple, we consider in detail the linear fitness /(£) = ko + k£. 
We find that for all values of A; > 0, fi > 0, and v > 0, 
the optimal value of £ c is positive, and the selected phase 
always occurs. 

How is it that recombination changes the phase di- 
agram or mean fitness? After all, this process sim- 
ply replaces an allele with another allele randomly cho- 
sen from the distribution of alleles at that site. This 
process, however, reduces the correlations between the 
composition of alleles at different sites in the sequence. 
These correlations are non-zero 17[, and so their reduc- 



tion changes the dynamics and the steady-state distribu- 
tion. Recombination sharpens the phase transition for 
/(£) = /(0) + S^.jAA, turning it into step function for 
u. More generally, recombination propagates favorable 
mutations throughout the population, thereby typically 
increasing the rate of evolution. 

We may alternatively interpret each site in our model 
as a coarse-grained representation of an allele or gene, 
each with only two states that may be changed either by 
mutation or gene transfer. We may also consider that the 
genome has been ordered so that all the mobile genetic 
elements are collected, say, at the end. If the genome 



can be considered approximately constant in length and 
if the replication rate can be approximately expressed by 
the quasispecies assumption n = Nf(iii), then the model 
we have discussed is a representation of horizontal gene 
transfer in a population of evolving bacterial species, be- 
cause homology is not a prerequisite for horizontal gene 
transfer in nature or our model. The diversity in the 
population represents the species diversity in a bacterial 
order, family, or genus. As with natural gene transfer by 
mobile elements, our model docs not assume homology is 
required. Our results make a couple generic predictions: 
1) for a sharp peak fitness, such as might be induced by a 
novel antibiotic, a population with horizontal gene trans- 
fer tends to be more uniformly resistant (u ~ 1) than is 
a population without (0 < u < 1), 2) while the contri- 
butions of mutation and horizontal gene transfer to the 
mean fitness are not identical, in Eq. ([8]) or (Tl"2"l) , they 
are similar for smooth replication rates. Horizontal gene 
transfer tends to incorporate alleles with new function, 
whereas mutation tends to adapt existing alleles for im- 
proved function. The observed rates of horizontal gene 
transfer and mutation might be expected to be the same 
order of magnitude, therefore, to balance the resources 
expended on the complementary tasks of large-scale evo- 
lution and local adaptation. As an example, we consider 
the evolution of E. coli from Salmonella. The rate of 
evolution due to horizontal gene transfer is estimated to 
be 16,000 bases/million years, while that due to point 
mutation, is 22,000 bases/million years [25|. These are 
observed rates, and so selection plays a role. The un- 
derlying rate of horizontal gene transfer in E. coli has 
been estimated to be about 10 -6 genes per cell per repli- 
cation 26], which corresponds to a change of roughly 
10~ 3 bases per sequence per replication, given the aver- 
age E. coli gene length of 10 3 bases. Taking the typical 
underlying E. coli mutation rate of 5 x 10~ 10 per base 
per replication [23| and noting that the E. coli genome 
length is w 5 x 10 6 bases, we find that point mutation 
modifies approximately 2.5 x 10 -3 bases per sequence per 
replication. Interestingly, this same equality of underly- 
ing horizontal gene transfer and point mutation rates per 
base per replication is also observed in quantitative mod- 
els of laboratory directed protein evolution optimized for 
evolutionary rate Q . 
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